What is Data Science?


In [2]:
from IPython.display import Image, display

Drew Conway


In [3]:
Image("images/drew_conway_venn.png", width=400)



@BigDataBorat


In [4]:
Image("images/bigdataborat_venn.png", width=400)



Someone on the internet

Maybe...

A Data Scientist is a statistician who lives in San Francisco

...or...

Data Science is statistics on a Mac

...or...

A Data Scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.

Hilary Mason

A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning. Data scientists not only are adept at working with data, but appreciate data itself as a first-class product.

All scientists

Hey wait, haven't we been doing data science for hundreds of years?

Tool usage as a proxy

The 2013 O'Reilly Data Science Salary Survey: Tools, Trends, What Pays (and What Doesn’t) for Data Professionals.

  • Salaries were positively correlated with the number of tools used by respondents.
  • Respondents selecting tools from the open source cluster had higher salaries than respondents selecting commercial tools.
  • Usage of R and Python is positively correlated.

In [5]:
Image("images/tool_usage.png", width=600)



data SCIENCE

What is scientific about data science? Here is my own take:

Data science involves the application of scientific methodologies to data sets that lie outside the traditional realms of science.